Fs-119/Using file search tool for report #56
Conversation
Force-pushed from 157c2ce to dc2790d
Looks good! Some minor changes
backend/src/agents/report_agent.py
Outdated
return await self.llm.chat_with_file(
    self.model,
    system_prompt=system_prompt,
    user_prompt=user_prompt,
Recently I have started initialising user_prompt and system_prompt directly as args in the chat calls. I don't know why we split these out as separate variables; I don't think it adds much.
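For illustration, a minimal sketch of that inlining; build_system_prompt and build_user_prompt are hypothetical helpers standing in for however the prompts are actually constructed in report_agent.py, not functions from this PR:

# Sketch only: pass the prompts inline rather than via intermediate variables.
return await self.llm.chat_with_file(
    self.model,
    system_prompt=build_system_prompt(topics),    # hypothetical helper
    user_prompt=build_user_prompt(file_content),  # hypothetical helper
)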
topics = await get_materiality_agent().list_material_topics(company_name)
report = await get_report_agent().create_report(file["content"], topics)
logger.info(f"Topics are: {topics}")
Are you also finding that OpenAI is not logging the LLM calls, which should be showing the responses here? I am sure our openai chat_with_file function has logger.info on these requests, which should show us the results, but they aren't appearing in the logs for me. I would be tempted to remove the logs you've added here and raise a bug to figure out why the logs aren't happening in chat_with_file.
I've raised https://scottlogic.atlassian.net/browse/FS-136 to capture this bug
backend/src/utils/file_utils.py
Outdated
)
file_stream = BytesIO(file_bytes)
file_size = len(file_bytes)
I think we need sys.getsizeof() here instead: https://stackoverflow.com/questions/65716589/python-size-of-byte-string-in-bytes
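For context: len() counts only the payload bytes, whereas sys.getsizeof() also includes the bytes object's header (about 33 bytes on 64-bit CPython). A quick check, using nothing beyond the standard library:

import sys

data = b"x" * (15 * 1024 * 1024)
print(len(data))            # 15728640 -- payload bytes only
print(sys.getsizeof(data))  # 15728673 on 64-bit CPython: payload plus object header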
from src.utils.file_utils import handle_file_upload


def test_handle_file_upload_size():
    with pytest.raises(HTTPException) as err:
        handle_file_upload(UploadFile(file=BinaryIO(), size=15*1024*1024))
    large_file_content = b"x" * (15*1024*1024 + 1)
I just checked, and sys.getsizeof(large_file_content) is 15,728,674, which is higher than 15*1024*1024 + 1 = 15,728,641.
👍 happy with this. Maybe worth adding some of these numbers as comments for future reference (see the sketch below).
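A sketch of what that annotated test could look like; wrapping the payload in BytesIO and the exact UploadFile arguments are my assumptions rather than the PR's final code:

import pytest
from io import BytesIO
from fastapi import HTTPException, UploadFile

from src.utils.file_utils import handle_file_upload


def test_handle_file_upload_size():
    # 15 MiB limit: 15 * 1024 * 1024 = 15,728,640 payload bytes.
    # sys.getsizeof() reports 33 bytes more than len() on 64-bit CPython,
    # e.g. 15,728,674 for the 15,728,641-byte payload below.
    large_file_content = b"x" * (15 * 1024 * 1024 + 1)  # one byte over the limit
    with pytest.raises(HTTPException):
        handle_file_upload(UploadFile(file=BytesIO(large_file_content)))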
response = await report_agent.create_report(
    LLMFile(file_name="test", file=b"Sample text content"),
    materiality_topics={"abc": "123"}
)
Might just be GitHub, but it looks like ) should be shifted left one tab.
I've deleted this file; the report agent doesn't do anything that needs testing.
backend/src/llm/openai.py
Outdated
if isinstance(file.file, (PathLike, str)):
    file_path = Path(file.file)
    with file_path.open("rb") as f:
        file_bytes = f.read()
elif isinstance(file.file, bytes):
    file_bytes = file.file
else:
    logger.error(f"Unsupported file type for '{file.file_name}'")
    continue
file = await client.files.create(file=(file.file_name, file_bytes), purpose="assistants")
Hopefully it shouldn't be necessary to convert the PathLike to bytes before calling client.files.create. Looking at https://github.com/openai/openai-python/blob/main/src/openai/_types.py#L53 I can see that you needed to pass in the file.file_name as a tuple; I am hoping you can still just do this with the PathLike files as well, e.g.
-if isinstance(file.file, (PathLike, str)):
-    file_path = Path(file.file)
-    with file_path.open("rb") as f:
-        file_bytes = f.read()
-elif isinstance(file.file, bytes):
-    file_bytes = file.file
-else:
-    logger.error(f"Unsupported file type for '{file.file_name}'")
-    continue
-file = await client.files.create(file=(file.file_name, file_bytes), purpose="assistants")
+file = await client.files.create(file=(file.file_name, file.file), purpose="assistants")
🤞 that works, as I imagine that f.read() in there will slow things down.
This didn't work for me; I get this error:
backend-1 | INFO: 172.18.0.1:47148 - "GET /suggestions HTTP/1.1" 200 OK
backend-1 | INFO: 172.18.0.4:48266 - "GET /health HTTP/1.1" 200 OK
backend-1 | INFO: 172.18.0.4:35206 - "GET /health HTTP/1.1" 200 OK
backend-1 | INFO: Attempting to get session for session_id: a5e943a6-d811-44c3-8dfa-002840207972
backend-1 | INFO: ***************** Session data retrieved from Redis for a5e943a6-d811-44c3-8dfa-002840207972: {}
backend-1 | INFO: upload file type=application/pdf name=Sustainability-Report-2023.pdf size=4467363
backend-1 | INFO: PDF content extracted successfully in 1.75 seconds
backend-1 | INFO: HTTP Request: POST https://api.mistral.ai/v1/chat/completions "HTTP/1.1 200 OK"
backend-1 | INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
backend-1 | INFO: OpenAI response: Finish reason: stop, Content: { "files": [ "Additional-Sector-Guidance-Biotech-and-Pharma.pdf" ]}
backend-1 | INFO: Uploading file 'Additional-Sector-Guidance-Biotech-and-Pharma.pdf' to OpenAI
backend-1 | INFO: Retrying request to /files in 0.475080 seconds
backend-1 | INFO: Retrying request to /files in 0.859916 seconds
backend-1 | ERROR: Connection error.
backend-1 | Traceback (most recent call last):
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1571, in _request
backend-1 | response = await self._client.send(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1674, in send
backend-1 | response = await self._send_handling_auth(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1702, in _send_handling_auth
backend-1 | response = await self._send_handling_redirects(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1739, in _send_handling_redirects
backend-1 | response = await self._send_single_request(request)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_client.py", line 1776, in _send_single_request
backend-1 | response = await transport.handle_async_request(request)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_transports/default.py", line 377, in handle_async_request
backend-1 | resp = await self._pool.handle_async_request(req)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
backend-1 | raise exc from None
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
backend-1 | response = await connection.handle_async_request(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpcore/_async/connection.py", line 103, in handle_async_request
backend-1 | return await self._connection.handle_async_request(request)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 136, in handle_async_request
backend-1 | raise exc
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 88, in handle_async_request
backend-1 | await self._send_request_body(**kwargs)
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpcore/_async/http11.py", line 157, in _send_request_body
backend-1 | async for chunk in request.stream:
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_multipart.py", line 268, in __aiter__
backend-1 | for chunk in self.iter_chunks():
backend-1 | ^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_multipart.py", line 230, in iter_chunks
backend-1 | yield from field.render()
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_multipart.py", line 190, in render
backend-1 | yield from self.render_data()
backend-1 | File "/usr/local/lib/python3.12/site-packages/httpx/_multipart.py", line 183, in render_data
backend-1 | chunk = self.file.read(self.CHUNK_SIZE)
backend-1 | ^^^^^^^^^^^^^^
backend-1 | AttributeError: 'PosixPath' object has no attribute 'read'
backend-1 |
backend-1 | The above exception was the direct cause of the following exception:
backend-1 |
backend-1 | Traceback (most recent call last):
backend-1 | File "/backend/src/api/app.py", line 132, in report
backend-1 | processed_upload = await create_report_from_file(file)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/backend/src/directors/report_director.py", line 29, in create_report_from_file
backend-1 | topics = await get_materiality_agent().list_material_topics(company_name)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/backend/src/agents/materiality_agent.py", line 27, in list_material_topics
backend-1 | materiality_topics = await self.llm.chat_with_file(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/backend/src/llm/openai.py", line 54, in chat_with_file
backend-1 | file_ids = await self.__upload_files(files)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/backend/src/llm/openai.py", line 91, in __upload_files
backend-1 | file = await client.files.create(file=(file.file_name, file.file), purpose="assistants")
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/resources/files.py", line 422, in create
backend-1 | return await self._post(
backend-1 | ^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1838, in post
backend-1 | return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1532, in request
backend-1 | return await self._request(
backend-1 | ^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1595, in _request
backend-1 | return await self._retry_request(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1665, in _retry_request
backend-1 | return await self._request(
backend-1 | ^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1595, in _request
backend-1 | return await self._retry_request(
backend-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1665, in _retry_request
backend-1 | return await self._request(
backend-1 | ^^^^^^^^^^^^^^^^^^^^
backend-1 | File "/usr/local/lib/python3.12/site-packages/openai/_base_client.py", line 1605, in _request
backend-1 | raise APIConnectionError(request=request) from err
backend-1 | openai.APIConnectionError: Connection error.
backend-1 | 2024-12-18 16:04:35 - src.api.app - ERROR - Connection error.
backend-1 | INFO: 172.18.0.1:41990 - "POST /report HTTP/1.1" 500 Internal Server Error
backend-1 | INFO: 172.18.0.4:33466 - "GET /health HTTP/1.1" 200 OK
backend-1 | INFO: 172.18.0.4:57280 - "GET /health HTTP/1.1" 200 OK
backend-1 | INFO: 172.18.0.4:43876 - "GET /health HTTP/1.1" 200 OK
backend-1 | INFO: 172.18.0.4:51610 - "GET /health HTTP/1.1" 200 OK
Ah, that's annoying. I've tried this and it seems to work:
-if isinstance(file.file, (PathLike, str)):
-    file_path = Path(file.file)
-    with file_path.open("rb") as f:
-        file_bytes = f.read()
-elif isinstance(file.file, bytes):
-    file_bytes = file.file
-else:
-    logger.error(f"Unsupported file type for '{file.file_name}'")
-    continue
-file = await client.files.create(file=(file.file_name, file_bytes), purpose="assistants")
+file = (file.file_name, file.file) if isinstance(file.file, bytes) else file.file
+response = await client.files.create(file=file, purpose="assistants")
I've changed the variable name to response as file was being overused.
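A possible explanation for why this form works and the tuple form didn't, from reading the two libraries rather than from this thread: the openai SDK accepts a bare os.PathLike and reads the file itself, but the content half of a (filename, content) tuple is handed straight to httpx's multipart encoder, which expects bytes or a file-like object with .read(), hence the AttributeError on PosixPath in the log above.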
-filename=file["filename"],
-id=file["uploadId"],
+filename=file.file_name,
+id=file_id,
👍
file = handle_file_upload(upload)
file_stream = await upload.read()
if upload.filename is None:
    raise ValueError("Filename cannot be None")
@IMladjenovic @mic-smith would this exception be better as an HTTPException, so that the error goes straight back to the UI?
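If that route is taken, the change would be roughly as follows; the 400 status code is my assumption, not something agreed in this thread:

from fastapi import HTTPException

if upload.filename is None:
    # Surface the problem to the UI as a 400 instead of letting a ValueError bubble up as a 500.
    raise HTTPException(status_code=400, detail="Filename cannot be None")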
Force-pushed from ef937df to 9d4803e
Force-pushed from b06d155 to 103e029
backend/src/llm/mistral.py
Outdated
) -> str:
    try:
        for file in files:
            file = handle_file_upload(files[0])
This files[0] looks wrong. Should this be file = handle_file_upload(file)? It would be good not to reassign to file too.
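A sketch of the shape being suggested; the name uploaded is mine:

for file in files:
    # Process each file in turn, without shadowing the loop variable.
    uploaded = handle_file_upload(file)
    ...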
Force-pushed from 9859720 to 05695ab
Force-pushed from 05695ab to 26a8e87
Description
Updated the report agent to work with both OpenAI and Mistral models for generating reports. Added support for Mistral models to use the chat-with-file function via PdfReader (extracting the PDF text locally), while the chat-with-file function for OpenAI models relies on OpenAI's file_search tool.
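As a rough sketch of the Mistral path described above, assuming PdfReader comes from the pypdf package (everything except PdfReader and the file_search tool name is assumed shape, not this PR's code):

from pypdf import PdfReader

def extract_pdf_text(file_path: str) -> str:
    # Mistral path: extract the PDF text locally and inline it into the prompt,
    # since Mistral has no equivalent of OpenAI's file_search tool here.
    reader = PdfReader(file_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

The OpenAI path instead uploads the file with client.files.create(..., purpose="assistants") and lets the file_search tool query it, as seen earlier in this conversation.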
Changelog